Optimal Feature Selection
Automatic selection of optimal features from the noise
Linear models struggle with high-dimensional noisy data
Finding the perfect subset is unfortunately an NP-hard problem, making it computationally prohibitive for problems of practical size. Heuristic approaches exist, such as forward and backward stepwise variable selection, as well as regularization methods like Lasso and elastic net. However, experiments with both synthetic and real data show that these approximations fail to recover the exact solution and, as a result, often select many spurious features.
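For concreteness, the exact problem in question is best-subset selection under a cardinality constraint, stated here in the standard least-squares form with k denoting the number of features allowed:

$$\min_{\beta \in \mathbb{R}^p} \; \lVert y - X\beta \rVert_2^2 \quad \text{subject to} \quad \lVert \beta \rVert_0 \le k,$$

where $\lVert \beta \rVert_0$ counts the nonzero coefficients of $\beta$. The constraint forces a search over all $\binom{p}{k}$ possible supports, which is the source of the combinatorial difficulty.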
True variable selection with modern optimization
Compared to Lasso, Optimal Feature Selection cuts the number of falsely selected features in half while maintaining the same true positive rate. It is therefore more efficient at choosing the right variables, and the resulting model is simpler, more interpretable, and more accurate.
Optimal Feature Selection achieves the highest accuracy and a lower false alarm rate than Lasso in an experiment with 2,000 features, of which only 10 are relevant.
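To see the failure mode concretely, the sketch below mirrors that setup with scikit-learn's cross-validated Lasso on synthetic data. This is an illustrative experiment of our own, not the benchmark code behind the figure; the dimensions and noise level are assumptions. With settings like these, Lasso typically returns far more than the 10 true features.

```python
# Illustrative sketch: count how many irrelevant features a
# cross-validated Lasso selects on sparse synthetic data.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p, k = 500, 2000, 10                      # samples, candidate features, true features
X = rng.standard_normal((n, p))
true_support = rng.choice(p, size=k, replace=False)
beta = np.zeros(p)
beta[true_support] = rng.uniform(1.0, 2.0, size=k)
y = X @ beta + 0.5 * rng.standard_normal(n)  # noisy observations

lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(lasso.coef_)

true_pos = len(set(selected) & set(true_support))
print(f"selected {len(selected)} features: {true_pos} true, "
      f"{len(selected) - true_pos} spurious")
```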
High interpretability and accuracy with a fraction of selected features
As the goal of this application was not only to predict accurately but also to learn the physical mechanism that affects the outcome, the simpler model stands out even more: it offers a small, practical set of the best variables for engineers to investigate further and act on to improve the outcome.
Comparing the performance of Optimal Feature Selection and Elastic Net across different numbers of selected features
Extremely fast and scalable
In particular, for industry-scale datasets with millions of observations and thousands of features, the heuristic option can find the optimal features in a matter of seconds. Moreover, experiments show that in the majority of cases studied the heuristic in fact finds the exact solution, so no tradeoff is needed when the data scale is extremely large.
Computational time comparisons. Both exact and approximate methods remain tractable as the number of samples increases.
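For a flavor of what a fast heuristic for the cardinality-constrained problem can look like, here is a minimal iterative-hard-thresholding sketch. This is a classical textbook heuristic shown purely for illustration; it is not the algorithm shipped in Optimal Feature Selection, which is described in the publications below.

```python
# Minimal iterative hard thresholding (IHT) sketch for
#   min ||y - X b||^2  subject to  ||b||_0 <= k.
import numpy as np

def iht(X, y, k, n_iter=200):
    n, p = X.shape
    step = 1.0 / np.linalg.norm(X, ord=2) ** 2   # 1 / Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        b = b + step * X.T @ (y - X @ b)         # gradient step on the least-squares loss
        keep = np.argsort(np.abs(b))[-k:]        # indices of the k largest coefficients
        mask = np.zeros(p, dtype=bool)
        mask[keep] = True
        b[~mask] = 0.0                           # hard-threshold everything else to zero
    return b
```

Each iteration costs only a few matrix-vector products, which is why heuristics of this flavor can scale to very large datasets.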
Related publications
Sparse high-dimensional regression: Exact scalable algorithms and phase transitions
Dimitris Bertsimas and Bart Van Parys
The Annals of Statistics, 2020
The original publication by our co-founder pioneering Optimal Feature Selection. This paper presents the first scalable approach to exact subset selection in linear regression and demonstrates superior empirical results compared to existing heuristic approaches.
Sparse Regression: Scalable algorithms and empirical performance
Dimitris Bertsimas, Jean Pauphilet and Bart Van Parys
Statistical Science, 2020
This paper develops an extremely scalable heuristic for solving the Optimal Feature Selection problem that is significantly faster than the original approach, without sacrificing performance.
Sparse classification and phase transitions: A discrete optimization perspective
Dimitris Bertsimas, Jean Pauphilet and Bart Van Parys
Preprint available on arXiv
This paper extends the Optimal Feature Selection methodology to classification problems, namely logistic regression and support vector machines. Experiments demonstrate its superior performance compared to alternative methods.